Partially Observed Markov Decision Process Multiarmed Bandits—Structural Results



Similar resources

Partially Observed Markov Decision Process Multiarmed Bandits - Structural Results

This paper considers multiarmed bandit problems involving partially observed Markov decision processes (POMDPs). We show how the Gittins index for the optimal scheduling policy can be computed by a value iteration algorithm on each process, thereby considerably simplifying the computational cost. A suboptimal value iteration algorithm based on Lovejoy’s approximation is presented. We then show ...


A Partially Observed Markov Decision Process for Dynamic Pricing

In this paper, we develop a stylized partially observed Markov decision process (POMDP) framework, to study a dynamic pricing problem faced by sellers of fashion-like goods. We consider a retailer that plans to sell a given stock of items during a finite sales season. The objective of the retailer is to dynamically price the product in a way that maximizes expected revenues. Our model brings to...


Robust partially observable Markov decision process

We seek to find the robust policy that maximizes the expected cumulative reward for the worst case when a partially observable Markov decision process (POMDP) has uncertain parameters whose values are only known to be in a given region. We prove that the robust value function, which represents the expected cumulative reward that can be obtained with the robust policy, is convex with respect to ...


The Infinite Partially Observable Markov Decision Process

The Partially Observable Markov Decision Process (POMDP) framework has proven useful in planning domains where agents must balance actions that provide knowledge and actions that provide reward. Unfortunately, most POMDPs are complex structures with a large number of parameters. In many real-world problems, both the structure and the parameters are difficult to specify from domain knowledge alo...


A Value Iteration Algorithm for Partially Observed Markov Decision Process Multi-armed Bandits

A value iteration based algorithm is given for computing the Gittins index of a Partially Observed Markov Decision Process (POMDP) Multi-armed Bandit problem. This problem concerns dynamical allocation of efforts between a number of competing projects of which only one can be worked on at any time period. The active project evolves according to a finite state Markov chain and generates then a r...



Journal

Journal title: Mathematics of Operations Research

Year: 2009

ISSN: 0364-765X,1526-5471

DOI: 10.1287/moor.1080.0371